Generalization in multitask deep neural classifiers: a statistical physics approach
We would first like to thank all three reviewers for their thorough, constructive and considered reviews. Appendix A, our model is a nonequilibrium variant of Derrida's Random Energy Model. We will update the final manuscript to describe this analogy more explicitly. As such, this is still a matter of active research. Conditions claimed in L181-184: We will amend the manuscript to indicate that the equation directly preceding eqn.
Bayes optimal learning of attention-indexed models
Boncoraglio, Fabrizio, Troiani, Emanuele, Erba, Vittorio, Zdeborová, Lenka
We introduce the attention-indexed model (AIM), a theoretical framework for analyzing learning in deep attention layers. Inspired by multi-index models, AIM captures how token-level outputs emerge from layered bilinear interactions over high-dimensional embeddings. Unlike prior tractable attention models, AIM allows full-width key and query matrices, aligning more closely with practical transformers. Using tools from statistical mechanics and random matrix theory, we derive closed-form predictions for Bayes-optimal generalization error and identify sharp phase transitions as a function of sample complexity, model width, and sequence length. We propose a matching approximate message passing algorithm and show that gradient descent can reach optimal performance. AIM offers a solvable playground for understanding learning in modern attention architectures.